Investigation of Random Subspace and Random Forest Regression Models Using Data with Injected Noise
نویسندگان
چکیده
The ensemble machine learning methods incorporating random subspace and random forest employing genetic fuzzy rule-based systems as base learning algorithms were developed in Matlab environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. The accuracy of ensembles generated by the proposed methods was compared with bagging, repeated holdout, and repeated cross-validation models. The tests were made for four levels of noise injected into the benchmark datasets. The analysis of the results was performed using statistical methodology including nonparametric tests followed by post-hoc procedures designed especially for multiple N×N comparisons.
منابع مشابه
Investigation of Property Valuation Models Based on Decision Tree Ensembles Built over Noised Data
The ensemble machine learning methods incorporating bagging, random subspace, random forest, and rotation forest employing decision trees, i.e. Pruned Model Trees, as base learning algorithms were developed in WEKA environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. T...
متن کاملComparison of Random Survival Forests for Competing Risks and Regression Models in Determining Mortality Risk Factors in Breast Cancer Patients in Mahdieh Center, Hamedan, Iran
Introduction: Breast cancer is one of the most common cancers among women worldwide. Patients with cancer may die due to disease progression or other types of events. These different event types are called competing risks. This study aimed to determine the factors affecting the survival of patients with breast cancer using three different approaches: cause-specific hazards regression, subdistri...
متن کاملInvestigation of Random Subspace and Random Forest Methods Applied to Property Valuation Data
The experiments aimed to compare the performance of random subspace and random forest models with bagging ensembles and single models in respect of its predictive accuracy were conducted using two popular algorithms M5 tree and multilayer perceptron. All tests were carried out in the WEKA data mining system within the framework of 10-fold cross-validation and repeated holdout splits. A comprehe...
متن کاملImprovement of Support Vector Machine and Random Forest Algorithm in Predicting Khorramabad River Flow Uusing Non-uniform De-Noising of data and Simplex Algorithm
In this study, in order to simulate the monthly flow of the Khorramabad River, the time series of this river was decomposed into three levels using the wavelet of Daubechies-3, during the period of 1955-2014. Based on this, it was found that there is a Non-uniform noise that includes two periods of time in this signal, with the October 2008 border which required that the signal be become non-un...
متن کاملComparison of Random Forest and Logistic Regression Methods in Predicting Mortality in Colorectal Cancer Patients and its Related Factors
Background and Objectives: The purpose of this study was to predict the mortality rate of colorectal cancer in Iranian patients and determine the effective factors on the mortality of patients with colorectal cancer using random forest and logistic regression methods. Methods: Data from 304 patients with colorectal cancer registry from the Gastroenterology and Liver Research Center of Shah...
متن کامل